Abstract

This note presents sharp inequalities for the deviation probability of a general
quadratic form of a random vector $\xi$ with finite exponential moments. The obtained
deviation bounds are similar to the case of a Gaussian random vector. The results are
stated under general conditions and do not suppose any special structure of the vector
$\xi$. The obtained bounds are exact (non-asymptotic), all constants are explicit and the
leading terms in the bounds are sharp.

1 Introduction

This paper presents a number of deviation probability bounds for a quadratic form
$\|\xi\|^2$ or, more generally, $\|B\xi\|^2$ of a random $p$-vector $\xi$ satisfying a general
exponential moment condition. Such quadratic forms arise in many problems. We mainly focus
on statistical applications such as hypothesis testing for linear models or linear model
selection. We refer to Massart (2007) for an extensive overview and numerous results on

*The author is partially supported by Laboratory for Structural Methods of Data Analysis in
Predictive Modeling, MIPT, RF government grant, ag. 11.G34.31.0073. Financial support by the German
Research Foundation (DFG) through the Collaborative Research Center 649 "Economic Risk" is
gratefully acknowledged.






SHARP DEVIATION BOUNDS FOR QUADRATIC FORMS 



probability bounds and their applications in statistical model selection. Limit theorems
for quadratic forms can be found e.g. in Götze and Tikhomirov (1999) and Horváth and
Shao (1999). Some concentration bounds for U-statistics are available in Bretagnolle
(1999), Giné et al. (2000), Houdré and Reynaud-Bouret (2003). We also refer to Baraud
(2010) for a number of statistical problems relying on such deviation bounds.

If $\xi$ is standard normal then $\|\xi\|^2$ is chi-squared with $p$ degrees of freedom. We
aim to extend this behavior to the case of a general vector $\xi$ satisfying the following
exponential moment condition:

$$\log \mathbb{E} \exp(\gamma^\top \xi) \le \|\gamma\|^2/2, \qquad \gamma \in \mathbb{R}^p, \ \|\gamma\| \le g. \tag{1.1}$$

Here $g$ is a positive constant which turns out to be very important in our results: it
determines the frontier between the Gaussian and non-Gaussian type deviation bounds.
Our first result shows that under (1.1) the deviation bounds for the quadratic form
$\|\xi\|^2$ are essentially the same as in the Gaussian case, if the value $g^2$ exceeds $Cp$
for a fixed constant $C$. Further we extend the result to the case of a more general form
$\|B\xi\|^2$. An important advantage of the approach of this paper, which distinguishes it
from all the previous studies, is that there are no additional conditions on the structure
or origin of the vector $\xi$. For instance, we do not assume that $\xi$ is a sum of
independent or weakly dependent random variables, or that the components of $\xi$ are
independent. The results are exact and stated in a non-asymptotic fashion, all the
constants are explicit and the leading terms are sharp.

As a motivating example, we consider a linear regression model $Y = \Psi^\top \theta + \varepsilon$ in
which the error vector $\varepsilon$ is zero mean. The ordinary least squares estimator
$\tilde\theta$ for the parameter vector $\theta$ reads as

$$\tilde\theta = (\Psi \Psi^\top)^{-1} \Psi Y$$

and it can be viewed as the maximum likelihood estimator in a Gaussian linear model
with a diagonal covariance matrix, that is, $Y \sim N(\Psi^\top \theta, \sigma^2 I_n)$. Define the $p \times p$ matrix

$$D_0^2 \stackrel{\mathrm{def}}{=} \Psi \Psi^\top.$$

Then

$$D_0 (\tilde\theta - \theta^*) = D_0^{-1} \zeta$$

with $\zeta \stackrel{\mathrm{def}}{=} \Psi \varepsilon$. The likelihood ratio test statistic for this problem is exactly $\|D_0^{-1} \zeta\|^2/2$.
Similarly, the model selection procedure is based on comparing such quadratic forms for
different matrices $D_0$; see e.g. Baraud (2010).

Now we indicate how this situation can be reduced to a bound for a vector $\xi$ satisfying
the condition (1.1). Suppose for simplicity that the errors $\varepsilon_i$ are independent and have
exponential moments.

(e1) There exist some constants $\nu_0$ and $g_1 > 0$, and for every $i$ a constant $s_i$ such
that $\mathbb{E}(\varepsilon_i/s_i)^2 \le 1$ and

$$\log \mathbb{E} \exp(\lambda \varepsilon_i/s_i) \le \nu_0^2 \lambda^2/2, \qquad |\lambda| \le g_1. \tag{1.2}$$

Here $g_1$ is a fixed positive constant. One can show that if this condition is fulfilled
for some $g_1 > 0$ and a constant $\nu_0 \ge 1$, then one can get a similar condition with $\nu_0$
arbitrarily close to one and $g_1$ slightly decreased. A natural candidate for $s_i$ is $\sigma_i$, where
$\sigma_i^2 = \mathbb{E} \varepsilon_i^2$ is the variance of $\varepsilon_i$. Under (1.2), introduce a $p \times p$ matrix $V_0$ defined by

$$V_0^2 \stackrel{\mathrm{def}}{=} \sum_i s_i^2 \Psi_i \Psi_i^\top,$$

where $\Psi_i$ denotes the $i$-th column of $\Psi$. Define also $\xi = V_0^{-1} \zeta$ and

$$N^{-1/2} \stackrel{\mathrm{def}}{=} \max_i \sup_{\gamma \in \mathbb{R}^p} \frac{s_i |\Psi_i^\top \gamma|}{\|V_0 \gamma\|}.$$

Simple calculation shows that for $\|\gamma\| \le g = g_1 N^{1/2}$

$$\log \mathbb{E} \exp(\gamma^\top \xi) \le \nu_0^2 \|\gamma\|^2/2, \qquad \gamma \in \mathbb{R}^p, \ \|\gamma\| \le g.$$

We conclude that (1.1) is nearly fulfilled under (e1) and moreover, the value $g^2$ is
proportional to the effective sample size $N$. The results of the paper allow us to get a
nearly $\chi^2$-behavior of the test statistic $\|D_0^{-1} \zeta\|^2$, which is a finite sample
version of the famous Wilks phenomenon; see e.g. Fan et al. (2001); Fan and Huang (2005),
Boucheron and Massart (2011).

The paper is organized as follows. Section 2 recalls the classical results about the
deviation probability of a Gaussian quadratic form. These results are presented only for
comparison and to make the paper self-contained.

Section 3 studies the probability of the form $\mathbb{P}(\|\xi\| > y)$ under the condition

$$\log \mathbb{E} \exp(\gamma^\top \xi) \le \nu_0^2 \|\gamma\|^2/2, \qquad \gamma \in \mathbb{R}^p, \ \|\gamma\| \le g.$$

The general case can be reduced to $\nu_0 = 1$ by rescaling $\xi$ and $g$:

$$\log \mathbb{E} \exp(\gamma^\top \xi/\nu_0) \le \|\gamma\|^2/2, \qquad \gamma \in \mathbb{R}^p, \ \|\gamma\| \le \nu_0 g,$$

that is, $\nu_0^{-1} \xi$ fulfills (1.1) with a slightly increased $g$.

The result is extended to the case of a general quadratic form in Section 4. Some
further extensions motivated by different statistical problems are given in Section 6 and
Section 7. All the proofs are collected in the Appendix.

2 Gaussian case 

Our benchmark will be a deviation bound for $\|\xi\|^2$ for a standard Gaussian vector $\xi$.
The ultimate goal is to show that under (1.1) the norm of the vector $\xi$ exhibits behavior
expected for a Gaussian vector, at least in the region of moderate deviations. For
comparison, we begin by stating the result for a Gaussian vector $\xi$.

Theorem 2.1. Let $\xi$ be a standard normal vector in $\mathbb{R}^p$. Then for any $u > 0$, it holds

$$\mathbb{P}(\|\xi\|^2 > p + u) \le \exp\{-(p/2) \phi(u/p)\}$$

with

$$\phi(t) \stackrel{\mathrm{def}}{=} t - \log(1 + t).$$

Let $\phi^{-1}(\cdot)$ stand for the inverse of $\phi(\cdot)$. For any $x$,

$$\mathbb{P}\bigl(\|\xi\|^2 > p + p \, \phi^{-1}(2x/p)\bigr) \le \exp(-x).$$

This particularly yields with $\varkappa = 6.6$

$$\mathbb{P}\bigl(\|\xi\|^2 > p + \sqrt{\varkappa x p} \vee (\varkappa x)\bigr) \le \exp(-x).$$

This is a simple version of a well known result and we present it only for comparison
with the non-Gaussian case. The message of this result is that the squared norm of the
Gaussian vector $\xi$ concentrates around the value $p$ and the deviations over the level
$p + \sqrt{x p}$ are exponentially small in $x$.
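
The statement of Theorem 2.1 can be checked numerically. The following is a minimal sketch, not part of the original note: the function names are illustrative, and the code only evaluates the bound $\exp\{-(p/2)\phi(u/p)\}$ at the level $u = \sqrt{\varkappa x p} \vee (\varkappa x)$ and compares it with $e^{-x}$.

```python
import math

KAPPA = 6.6  # the constant called kappa in Theorem 2.1


def phi(t: float) -> float:
    """Rate function phi(t) = t - log(1 + t)."""
    return t - math.log1p(t)


def gaussian_tail_bound(p: float, u: float) -> float:
    """Upper bound exp{-(p/2) phi(u/p)} for P(||xi||^2 > p + u)."""
    return math.exp(-0.5 * p * phi(u / p))


def deviation_level(p: float, x: float, kappa: float = KAPPA) -> float:
    """u = sqrt(kappa * x * p) v (kappa * x); the threshold is p + u."""
    return max(math.sqrt(kappa * x * p), kappa * x)


if __name__ == "__main__":
    for p in (1, 10, 100):
        for x in (0.5, 2.0, 50.0):
            u = deviation_level(p, x)
            # The first number should not exceed the second one.
            print(p, x, gaussian_tail_bound(p, u), math.exp(-x))
```

The two regimes of the maximum correspond to $x \le p/\varkappa$ (square-root term dominates) and $x > p/\varkappa$ (linear term dominates).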

A similar bound can be obtained for the norm of the vector $B\xi$ where $B$ is some
given matrix. For notational simplicity we assume that $B$ is symmetric. Otherwise one
should replace it with $(B^\top B)^{1/2}$.

Theorem 2.2. Let $\xi$ be standard normal in $\mathbb{R}^p$. Then for every $x > 0$ and any
symmetric matrix $B$, it holds with $p = \operatorname{tr}(B^2)$, $v^2 = 2 \operatorname{tr}(B^4)$, and $a^* = \|B^2\|_\infty$

$$\mathbb{P}\bigl(\|B\xi\|^2 > p + (2 v x^{1/2}) \vee (6 a^* x)\bigr) \le \exp(-x).$$
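
As an illustration of Theorem 2.2, the sketch below works in the diagonal case used in Appendix B, where $B^2$ is represented by its eigenvalues $a_1, \ldots, a_p$. It is a hedged example rather than the authors' code: the function names are hypothetical, the eigenvalues are assumed nonnegative with a positive maximum, and the choice of $\mu$ follows the two cases of the proof after rescaling by $a^*$.

```python
import math


def quadratic_form_level(eigs, x):
    """Threshold p + (2 v sqrt(x)) v (6 a* x) from Theorem 2.2.

    eigs: eigenvalues a_1..a_p of B^2 (assumed nonnegative, max > 0).
    """
    p = sum(eigs)                                   # p = tr(B^2)
    v = math.sqrt(2.0 * sum(a * a for a in eigs))   # v^2 = 2 tr(B^4)
    a_star = max(eigs)                              # a* = ||B^2||_inf
    return p + max(2.0 * v * math.sqrt(x), 6.0 * a_star * x)


def proof_mu(eigs, x):
    """mu as chosen in the proof: u/v^2 if sqrt(x) <= v/3, else 2/3 (rescaled by a*)."""
    a_star = max(eigs)
    v_scaled = math.sqrt(2.0 * sum(a * a for a in eigs)) / a_star
    mu_scaled = min(2.0 * math.sqrt(x) / v_scaled, 2.0 / 3.0)
    return mu_scaled / a_star


def chernoff_bound(eigs, level, mu):
    """exp{-mu * level / 2} * prod_i (1 - mu a_i)^{-1/2}; needs mu * max(eigs) < 1."""
    log_mgf = -0.5 * sum(math.log1p(-mu * a) for a in eigs)
    return math.exp(-0.5 * mu * level + log_mgf)
```

For example, `chernoff_bound(eigs, quadratic_form_level(eigs, x), proof_mu(eigs, x))` evaluates the exponential Markov bound at the threshold of the theorem, which should not exceed `math.exp(-x)`.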
Below we establish similar bounds for a non-Gaussian vector $\xi$ obeying (1.1).

3 A bound for the $\ell_2$-norm

This section presents a general exponential bound for the probability $\mathbb{P}(\|\xi\| > y)$ under
(1.1). The main result tells us that if $y$ is not too large, namely if $y \le y_c$ for a
critical value $y_c$ specified below, then the deviation probability is essentially the same
as in the Gaussian case.

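To describe the value $y_c$, introduce the following notation. Given $g$ and $p$, define
the values $w_0 = g p^{-1/2}$ and $w_c$ by the equation

$$\frac{w_c (1 + w_c)}{(1 + w_c^2)^{1/2}} = w_0 = g p^{-1/2}. \tag{3.1}$$

It is easy to see that $w_0/\sqrt{2} \le w_c \le w_0$. Further define

$$\mu_c \stackrel{\mathrm{def}}{=} \frac{w_c^2}{1 + w_c^2}, \qquad y_c^2 \stackrel{\mathrm{def}}{=} (1 + w_c^2) p, \qquad x_c \stackrel{\mathrm{def}}{=} 0.5 p \bigl[ w_c^2 - \log(1 + w_c^2) \bigr]. \tag{3.2}$$

Note that for $g^2 \ge p$, the quantities $y_c$ and $x_c$ can be evaluated as $y_c^2 \ge w_c^2 p \ge g^2/2$
and $x_c \gtrsim p w_c^2/2 \ge g^2/4$.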
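
The critical quantities are easy to compute numerically. The sketch below is an illustration under stated assumptions: it solves (3.1) by bisection on the bracket $[w_0/\sqrt{2}, w_0]$ and uses $\mu_c = w_c^2/(1 + w_c^2)$ and $y_c^2 = (1 + w_c^2)p$ as in Section 4 and Appendix C. The function names and the tolerance are hypothetical choices.

```python
import math


def solve_w_c(g: float, p: float, tol: float = 1e-12) -> float:
    """Bisection for w_c in w(1 + w)/sqrt(1 + w^2) = g/sqrt(p), equation (3.1)."""
    w0 = g / math.sqrt(p)

    def lhs(w: float) -> float:
        return w * (1.0 + w) / math.sqrt(1.0 + w * w)

    # lhs is increasing and lhs(w0/sqrt(2)) <= w0 <= lhs(w0).
    lo, hi = w0 / math.sqrt(2.0), w0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < w0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_values(g: float, p: float) -> dict:
    """Return w_c, mu_c, y_c, x_c and g_c = g w_c / (1 + w_c)."""
    w = solve_w_c(g, p)
    w2 = w * w
    return {
        "w_c": w,
        "mu_c": w2 / (1.0 + w2),
        "y_c": math.sqrt((1.0 + w2) * p),
        "x_c": 0.5 * p * (w2 - math.log1p(w2)),
        "g_c": g * w / (1.0 + w),
    }
```

A quick consistency check is that `g_c` agrees with `g - sqrt(mu_c * p)` and `y_c` with `g/mu_c - sqrt(p/mu_c)`, the two identities used in the proof of Theorem 3.1.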

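Theorem 3.1. Let $\xi \in \mathbb{R}^p$ fulfill (1.1). Then it holds for each $x \le x_c$

$$\mathbb{P}\bigl(\|\xi\|^2 > p + \sqrt{\varkappa x p} \vee (\varkappa x), \ \|\xi\| \le y_c\bigr) \le 2 \exp(-x),$$

where $\varkappa = 6.6$. Moreover, for $y \ge y_c$, it holds with $g_c = g - \sqrt{\mu_c p} = g w_c/(1 + w_c)$

$$\mathbb{P}(\|\xi\| > y) \le 8.4 \exp\{-g_c y/2 - (p/2) \log(1 - g_c/y)\} \le 8.4 \exp\{-x_c - g_c (y - y_c)/2\}.$$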

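The statements of Theorem 3.1 can be simplified under the assumption $g^2 \ge p$.

Corollary 3.2. Let $\xi$ fulfill (1.1) and $g^2 \ge p$. Then it holds for $x \le x_c$

$$\mathbb{P}\bigl(\|\xi\|^2 \ge \mathfrak{z}(x, p)\bigr) \le 2 e^{-x} + 8.4 e^{-x_c}, \tag{3.3}$$

$$\mathfrak{z}(x, p) \stackrel{\mathrm{def}}{=} \begin{cases} p + \sqrt{\varkappa x p}, & x \le p/\varkappa, \\ p + \varkappa x, & p/\varkappa < x \le x_c, \end{cases} \tag{3.4}$$

with $\varkappa = 6.6$. For $x > x_c$

$$\mathbb{P}\bigl(\|\xi\|^2 \ge \mathfrak{z}_c(x, p)\bigr) \le 8.4 e^{-x}, \qquad \mathfrak{z}_c(x, p) \stackrel{\mathrm{def}}{=} \bigl| y_c + 2 (x - x_c)/g_c \bigr|^2.$$

This result implicitly assumes that $p \le \varkappa x_c$, which is fulfilled if $w_0^2 = g^2/p \ge 1$:

$$\varkappa x_c = 0.5 \varkappa \bigl[ w_c^2 - \log(1 + w_c^2) \bigr] p \ge 3.3 [1 - \log(2)] p > p.$$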

For $x \le x_c$, the function $\mathfrak{z}(x, p)$ mimics the quantile behavior of the chi-squared
distribution $\chi^2_p$ with $p$ degrees of freedom. Moreover, an increase of the value $g$ yields
a growth of the sub-Gaussian zone. In particular, for $g = \infty$, a general quadratic form
$\|\xi\|^2$ has under (1.1) the same tail behavior as in the Gaussian case.

Finally, in the large deviation zone $x > x_c$ the deviation probability decays as $e^{-c x^{1/2}}$
for some fixed $c$. However, if the constant $g$ in the condition (1.1) is sufficiently large
relative to $p$, then $x_c$ is large as well and the large deviation zone $x > x_c$ can be ignored
at a small price of $8.4 e^{-x_c}$ and one can focus on the deviation bound described by (3.3)
and (3.4).
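
The piecewise quantile function of Corollary 3.2 can be written down directly. The sketch below is illustrative only: the function name is hypothetical, and the critical values `x_c`, `y_c`, `g_c` are taken as inputs (they are assumed to have been computed from $g$ and $p$ beforehand).

```python
import math

KAPPA = 6.6  # the constant kappa of Corollary 3.2


def z_level(x: float, p: float, x_c: float, y_c: float, g_c: float,
            kappa: float = KAPPA) -> float:
    """Deviation level for ||xi||^2: z(x, p) for x <= x_c, z_c(x, p) for x > x_c."""
    if x <= x_c:
        if x <= p / kappa:
            return p + math.sqrt(kappa * x * p)   # sub-Gaussian regime
        return p + kappa * x                      # sub-exponential regime
    # Large deviation zone: |y_c + 2 (x - x_c) / g_c|^2.
    return (y_c + 2.0 * (x - x_c) / g_c) ** 2
```

At the switching point $x = p/\varkappa$ both branches of (3.4) give the value $2p$, so the level is continuous there.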

4 A bound for a quadratic form 

Now we extend the result to a more general bound for $\|B\xi\|^2 = \xi^\top B^2 \xi$ with a given
matrix $B$ and a vector $\xi$ obeying the condition (1.1). Similarly to the Gaussian case
we assume that $B$ is symmetric. Define important characteristics of $B$

$$p = \operatorname{tr}(B^2), \qquad v^2 = 2 \operatorname{tr}(B^4), \qquad \lambda^* \stackrel{\mathrm{def}}{=} \|B^2\|_\infty \stackrel{\mathrm{def}}{=} \lambda_{\max}(B^2).$$

For simplicity of formulation we suppose that $\lambda^* = 1$, otherwise one has to replace $p$
and $v^2$ with $p/\lambda^*$ and $v^2/\lambda^*$.

Let $g$ be as in (1.1). Define similarly to the $\ell_2$-case $w_c$ by the equation

$$\frac{w_c (1 + w_c)}{(1 + w_c^2)^{1/2}} = g p^{-1/2}.$$

Define also $\mu_c = w_c^2/(1 + w_c^2) \wedge 2/3$. Note that $w_c^2 \ge 2$ implies $\mu_c = 2/3$. Further define

$$y_c^2 = (1 + w_c^2) p, \qquad 2 x_c = \mu_c y_c^2 + \log\det\{I_p - \mu_c B^2\}. \tag{4.1}$$

Similarly to the case with $B = I_p$, under the condition $g^2 \ge p$, one can bound $y_c^2 \ge g^2/2$
and $x_c \gtrsim g^2/4$.

Theorem 4.1. Let a random vector $\xi$ in $\mathbb{R}^p$ fulfill (1.1). Then for each $x \le x_c$

$$\mathbb{P}\bigl(\|B\xi\|^2 > p + (2 v x^{1/2}) \vee (6x), \ \|B\xi\| \le y_c\bigr) \le 2 \exp(-x).$$

Moreover, for $y \ge y_c$, with $g_c = g - \sqrt{\mu_c p} = g w_c/(1 + w_c)$, it holds

$$\mathbb{P}(\|B\xi\| > y) \le 8.4 \exp(-x_c - g_c (y - y_c)/2).$$

Now we describe the value $\mathfrak{z}(x, B)$ ensuring a small value for the large deviation
probability $\mathbb{P}(\|B\xi\|^2 > \mathfrak{z}(x, B))$. For ease of formulation, we suppose that $g^2 \ge 2p$
yielding $\mu_c^{-1} \le 3/2$. The other case can be easily adjusted.

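Corollary 4.2. Let $\xi$ fulfill (1.1) with $g^2 \ge 2p$. Then it holds for $x \le x_c$ with $x_c$ from
(4.1):

$$\mathbb{P}\bigl(\|B\xi\|^2 \ge \mathfrak{z}(x, B)\bigr) \le 2 e^{-x} + 8.4 e^{-x_c},$$

$$\mathfrak{z}(x, B) \stackrel{\mathrm{def}}{=} \begin{cases} p + 2 v x^{1/2}, & x \le v/18, \\ p + 6x, & v/18 < x \le x_c. \end{cases} \tag{4.2}$$

For $x > x_c$

$$\mathbb{P}\bigl(\|B\xi\|^2 \ge \mathfrak{z}_c(x, B)\bigr) \le 8.4 e^{-x}, \qquad \mathfrak{z}_c(x, B) \stackrel{\mathrm{def}}{=} \bigl| y_c + 2 (x - x_c)/g_c \bigr|^2.$$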

5 Rescaling and regularity condition 

The result of Theorem 4.1 can be extended to a more general situation when the condition
(1.1) is fulfilled for a vector $\zeta$ rescaled by a matrix $V_0$. More precisely, let the random
$p$-vector $\zeta$ fulfill for some $p \times p$ matrix $V_0$ the condition

$$\sup_{\gamma \in \mathbb{R}^p} \log \mathbb{E} \exp\Bigl( \lambda \frac{\gamma^\top \zeta}{\|V_0 \gamma\|} \Bigr) \le \nu_0^2 \lambda^2/2, \qquad |\lambda| \le g, \tag{5.1}$$

with some constants $g > 0$, $\nu_0 \ge 1$. Again, a simple change of variables reduces the case
of an arbitrary $\nu_0 \ge 1$ to $\nu_0 = 1$. Our aim is to bound the squared norm $\|D_0^{-1} \zeta\|^2$ of a
vector $D_0^{-1} \zeta$ for another $p \times p$ positive symmetric matrix $D_0^2$. Note that condition (5.1)
implies (1.1) for the rescaled vector $\xi = V_0^{-1} \zeta$. This leads to bounding the quadratic
form $\|D_0^{-1} V_0 \xi\|^2 = \|B\xi\|^2$ with $B^2 = D_0^{-1} V_0^2 D_0^{-1}$. It obviously holds

$$p = \operatorname{tr}(B^2) = \operatorname{tr}(D_0^{-2} V_0^2).$$

Now we can apply the result of Corollary 4.2.

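Corollary 5.1. Let $\zeta$ fulfill (5.1) with some $V_0$ and $g$. Given $D_0$, define $B^2 =
D_0^{-1} V_0^2 D_0^{-1}$, and let $g^2 \ge 2p$. Then it holds for $x \le x_c$ with $x_c$ from (4.1):

$$\mathbb{P}\bigl(\|D_0^{-1} \zeta\|^2 \ge \mathfrak{z}(x, B)\bigr) \le 2 e^{-x} + 8.4 e^{-x_c},$$

with $\mathfrak{z}(x, B)$ from (4.2). For $x > x_c$

$$\mathbb{P}\bigl(\|D_0^{-1} \zeta\|^2 \ge \mathfrak{z}_c(x, B)\bigr) \le 8.4 e^{-x}, \qquad \mathfrak{z}_c(x, B) \stackrel{\mathrm{def}}{=} \bigl| y_c + 2 (x - x_c)/g_c \bigr|^2.$$

In the regular case with $D_0 \ge a V_0$ for some $a > 0$, one obtains $\|B\|_\infty \le a^{-1}$ and

$$v^2 = 2 \operatorname{tr}(B^4) \le 2 a^{-2} p.$$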



6 A chi-squared bound with norm-constraints 

This section extends the results to the case when the bound (1.1) requires some other
conditions than the $\ell_2$-norm of the vector $\gamma$. Namely, we suppose that

$$\log \mathbb{E} \exp(\gamma^\top \xi) \le \|\gamma\|^2/2, \qquad \gamma \in \mathbb{R}^p, \ \|\gamma\|_\circ \le g_\circ, \tag{6.1}$$

where $\|\cdot\|_\circ$ is a norm which differs from the usual Euclidean norm. Our driving example
is given by the sup-norm case with $\|\gamma\|_\circ = \|\gamma\|_\infty$. We are interested in checking whether
the previous results of Section 3 still apply. The answer depends on how massive the set
$A(r) = \{\gamma : \|\gamma\|_\circ \le r\}$ is in terms of the standard Gaussian measure on $\mathbb{R}^p$. Recall that
the quadratic norm $\|\varepsilon\|^2$ of a standard Gaussian vector $\varepsilon$ in $\mathbb{R}^p$ concentrates around
$p$, at least for $p$ large. We need a similar concentration property for the norm $\|\cdot\|_\circ$.
More precisely, we assume for a fixed $r_*$ that

$$\mathbb{P}(\|\varepsilon\|_\circ \le r_*) \ge 1/2, \qquad \varepsilon \sim N(0, I_p). \tag{6.2}$$

This implies for any value $u_\circ > 0$ and all $u \in \mathbb{R}^p$ with $\|u\|_\circ \le u_\circ$ that

$$\mathbb{P}(\|\varepsilon - u\|_\circ \le r_* + u_\circ) \ge 1/2, \qquad \varepsilon \sim N(0, I_p).$$

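For each $\mathfrak{z} > p$, consider

$$\mu(\mathfrak{z}) = (\mathfrak{z} - p)/\mathfrak{z}.$$

Given $u_\circ$, denote by $\mathfrak{z}_\circ = \mathfrak{z}_\circ(u_\circ)$ the root of the equation

$$\frac{g_\circ}{\mu(\mathfrak{z}_\circ)} - \frac{r_*}{\mu^{1/2}(\mathfrak{z}_\circ)} = u_\circ. \tag{6.3}$$

One can easily see that this value exists and is unique if $u_\circ \ge g_\circ - r_*$ and it can be
defined as the largest $\mathfrak{z}$ for which $g_\circ/\mu(\mathfrak{z}) - r_*/\mu^{1/2}(\mathfrak{z}) \ge u_\circ$. Let $\mu_\circ = \mu(\mathfrak{z}_\circ)$ be the
corresponding $\mu$-value. Define also $x_\circ$ by

$$2 x_\circ = \mu_\circ \mathfrak{z}_\circ + p \log(1 - \mu_\circ).$$

If $u_\circ < g_\circ - r_*$, then set $\mathfrak{z}_\circ = \infty$, $x_\circ = \infty$.

Theorem 6.1. Let a random vector $\xi$ in $\mathbb{R}^p$ fulfill (6.1). Suppose (6.2) and let, given
$u_\circ$, the value $\mathfrak{z}_\circ$ be defined by (6.3). Then it holds for any $u > 0$

$$\mathbb{P}\bigl(\|\xi\|^2 > p + u, \ \|\xi\|_\circ \le u_\circ\bigr) \le 2 \exp\{-(p/2) \phi(u/p)\}, \tag{6.4}$$

yielding for $x \le x_\circ$

$$\mathbb{P}\bigl(\|\xi\|^2 > p + \sqrt{\varkappa x p} \vee (\varkappa x), \ \|\xi\|_\circ \le u_\circ\bigr) \le 2 \exp(-x), \tag{6.5}$$

where $\varkappa = 6.6$. Moreover, for $\mathfrak{z} \ge \mathfrak{z}_\circ$, it holds

$$\mathbb{P}\bigl(\|\xi\|^2 > \mathfrak{z}, \ \|\xi\|_\circ \le u_\circ\bigr) \le 2 \exp\{-\mu_\circ \mathfrak{z}/2 - (p/2) \log(1 - \mu_\circ)\} = 2 \exp\{-x_\circ - \mu_\circ (\mathfrak{z} - \mathfrak{z}_\circ)/2\}.$$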
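
The root of (6.3) has a closed form, which the following sketch uses. This is an illustration under an explicit assumption: equation (6.3) is read as $g_\circ/\mu - r_*/\mu^{1/2} = u_\circ$, the form that also appears in Appendix E. Substituting $s = \mu^{-1/2}$ turns it into the quadratic $g_\circ s^2 - r_* s - u_\circ = 0$. The function name is hypothetical.

```python
import math


def solve_z_circ(g: float, r: float, u: float, p: float):
    """Return (z, mu, x) with g/mu - r/sqrt(mu) = u, z = p/(1 - mu),
    and 2x = mu * z + p * log(1 - mu).

    If u <= g - r there is no root with mu < 1; following the text,
    z and x are then set to infinity.
    """
    if u <= g - r:
        return math.inf, 1.0, math.inf
    # Positive root of g s^2 - r s - u = 0 with s = mu^{-1/2}; here s > 1.
    s = (r + math.sqrt(r * r + 4.0 * g * u)) / (2.0 * g)
    mu = 1.0 / (s * s)
    z = p / (1.0 - mu)          # inverts mu(z) = (z - p)/z
    x = 0.5 * (mu * z + p * math.log1p(-mu))
    return z, mu, x


if __name__ == "__main__":
    # Illustrative inputs only: g = 3, r = 2, u = 5, p = 10 give mu = 0.36.
    print(solve_z_circ(3.0, 2.0, 5.0, 10.0))
```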

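It is easy to check that the result continues to hold for the norm of $\Pi\xi$ for a given
sub-projector $\Pi$ in $\mathbb{R}^p$ satisfying $\Pi = \Pi^\top$, $\Pi^2 \le \Pi$. As above, denote $p \stackrel{\mathrm{def}}{=} \operatorname{tr}(\Pi^2)$,
$v^2 \stackrel{\mathrm{def}}{=} 2 \operatorname{tr}(\Pi^4)$. Let $r_*$ be fixed to ensure

$$\mathbb{P}(\|\Pi \varepsilon\|_\circ \le r_*) \ge 1/2, \qquad \varepsilon \sim N(0, I_p).$$

The next result is stated for $g_\circ \ge r_* + u_\circ$, which simplifies the formulation.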

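Theorem 6.2. Let a random vector $\xi$ in $\mathbb{R}^p$ fulfill (6.1) and let $\Pi$ satisfy $\Pi = \Pi^\top$,
$\Pi^2 \le \Pi$. Let some $u_\circ$ be fixed. Then for any $\mu_\circ \le 2/3$ with $g_\circ \mu_\circ^{-1} - r_* \mu_\circ^{-1/2} \ge u_\circ$,

$$\mathbb{E} \exp\Bigl\{ \frac{\mu_\circ}{2} \bigl( \|\Pi\xi\|^2 - p \bigr) \Bigr\} \mathbb{1}\bigl( \|\Pi^2 \xi\|_\circ \le u_\circ \bigr) \le 2 \exp(\mu_\circ^2 v^2/4), \tag{6.6}$$

where $v^2 = 2 \operatorname{tr}(\Pi^4)$. Moreover, if $g_\circ \ge r_* + u_\circ$, then for any $x > 0$ and any
$\mathfrak{z} \ge p + (2 v x^{1/2}) \vee (6x)$

$$\mathbb{P}\bigl(\|\Pi\xi\|^2 > \mathfrak{z}, \ \|\Pi^2 \xi\|_\circ \le u_\circ\bigr) \le \mathbb{P}\bigl(\|\Pi\xi\|^2 > p + (2 v x^{1/2}) \vee (6x), \ \|\Pi^2 \xi\|_\circ \le u_\circ\bigr) \le 2 \exp(-x).$$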

7 A bound for the $\ell_2$-norm under Bernstein conditions

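For comparison, we specialize the results to the case considered recently in Baraud (2010).
Let $\zeta$ be a random vector in $\mathbb{R}^n$ whose components $\zeta_i$ are independent and satisfy the
Bernstein type conditions: for all $|\lambda| < c^{-1}$

$$\log \mathbb{E} e^{\lambda \zeta_i} \le \frac{\lambda^2 \sigma^2}{1 - c |\lambda|}. \tag{7.1}$$

Denote $\xi = \zeta/(2\sigma)$ and consider $\|\gamma\|_\circ = \|\gamma\|_\infty$. Fix $g_\circ = \sigma/c$. If $\|\gamma\|_\circ \le g_\circ$, then
$1 - c |\gamma_i|/(2\sigma) \ge 1/2$ and

$$\log \mathbb{E} \exp(\gamma^\top \xi) = \sum_i \log \mathbb{E} \exp\Bigl( \frac{\gamma_i \zeta_i}{2\sigma} \Bigr) \le \sum_i \frac{|\gamma_i/(2\sigma)|^2 \sigma^2}{1 - c |\gamma_i|/(2\sigma)} \le \|\gamma\|^2/2.$$

Let also $S$ be some linear subspace of $\mathbb{R}^n$ with dimension $p$ and let $\Pi_S$ denote the
projector on $S$. For applying the result of Theorem 6.1, the value $r_*$ has to be fixed.
We use that the infinity norm $\|\varepsilon\|_\infty$ concentrates around $\sqrt{2 \log p}$.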

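Lemma 7.1. It holds for a standard normal vector $\varepsilon \in \mathbb{R}^p$ with $r_* = \sqrt{2 \log p}$

$$\mathbb{P}(\|\varepsilon\|_\circ \le r_*) \ge 1/2.$$

Proof. By definition

$$\mathbb{P}(\|\varepsilon\|_\circ > r_*) \le \mathbb{P}\bigl(\|\varepsilon\|_\infty > \sqrt{2 \log p}\bigr) \le p \, \mathbb{P}\bigl(|\varepsilon_1| > \sqrt{2 \log p}\bigr) \le 1/2$$

as required. □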
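
The last inequality in the proof can be checked numerically with the complementary error function, since $\mathbb{P}(|\varepsilon_1| > r) = \operatorname{erfc}(r/\sqrt{2})$ for a standard normal $\varepsilon_1$. The sketch below is an illustration with a hypothetical function name. For $p = 1$ the radius $\sqrt{2 \log p}$ equals zero and the union bound equals one, so the check starts at $p = 2$.

```python
import math


def union_bound(p: int) -> float:
    """p * P(|eps_1| > sqrt(2 log p)) for a standard normal eps_1."""
    r = math.sqrt(2.0 * math.log(p))
    two_sided_tail = math.erfc(r / math.sqrt(2.0))  # P(|N(0,1)| > r)
    return p * two_sided_tail


if __name__ == "__main__":
    for p in (2, 3, 10, 100, 10_000):
        print(p, union_bound(p))
```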

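Now the general bound of Theorem 6.1 is applied to bounding the norm of $\Pi_S \xi$.
For simplicity of formulation we assume that $g_\circ \ge u_\circ + r_*$.

Theorem 7.2. Let $S$ be some linear subspace of $\mathbb{R}^n$ with dimension $p$. Let $g_\circ \ge
u_\circ + r_*$. If the coordinates $\zeta_i$ of $\zeta$ are independent and satisfy (7.1), then for all $x$,

$$\mathbb{P}\bigl( (4\sigma^2)^{-1} \|\Pi_S \zeta\|^2 > p + \sqrt{\varkappa x p} \vee (\varkappa x), \ \|\Pi_S \zeta\|_\infty \le 2 \sigma u_\circ \bigr) \le 2 \exp(-x).$$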

The bound of Baraud (2010) reads

$$\mathbb{P}\bigl( \|\Pi_S \zeta\|_2 > (3\sigma \vee \sqrt{6 c u}) \sqrt{x + 3p}, \ \|\Pi_S \zeta\|_\infty \le 2 \sigma u_\circ \bigr) \le e^{-x}.$$

As expected, in the region x < Xc of Gaussian approximation, the bound of Baraud is 
not sharp and actually quite rough. 

A Proof of Theorem 2.1 

The proof utilizes the following well known fact: for $\mu < 1$

$$\log \mathbb{E} \exp(\mu \|\xi\|^2/2) = -0.5 p \log(1 - \mu).$$

It can be obtained by straightforward calculus. Now consider any $u > 0$. By the
exponential Chebyshev inequality

$$\mathbb{P}(\|\xi\|^2 > p + u) \le \exp\{-\mu (p + u)/2\} \, \mathbb{E} \exp(\mu \|\xi\|^2/2) = \exp\{-\mu (p + u)/2 - (p/2) \log(1 - \mu)\}. \tag{A.1}$$

It is easy to see that the value $\mu = u/(u + p)$ maximizes $\mu (p + u) + p \log(1 - \mu)$ w.r.t.
$\mu$ yielding

$$\mu (p + u) + p \log(1 - \mu) = u - p \log(1 + u/p).$$

Further we use that $t - \log(1 + t) \ge a_0 t^2$ for $t \le 1$ and $t - \log(1 + t) \ge a_0 t$ for $t > 1$
with $a_0 = 1 - \log(2) \ge 0.3$. This implies with $t = u/p$ for $u = \sqrt{\varkappa x p}$ or $u = \varkappa x$ and
$\varkappa = 2/a_0 < 6.6$ that

$$\mathbb{P}\bigl(\|\xi\|^2 > p + \sqrt{\varkappa x p} \vee (\varkappa x)\bigr) \le \exp(-x)$$

as required.
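
The elementary inequality and the constant used in this proof can be verified numerically. The following sketch is illustrative only (the names are hypothetical): it computes $a_0 = 1 - \log 2$ and $2/a_0$, and compares $t - \log(1 + t)$ with the lower envelope $a_0 t^2$ for $t \le 1$ and $a_0 t$ for $t > 1$ on a grid.

```python
import math

A0 = 1.0 - math.log(2.0)   # a_0 = 1 - log 2, roughly 0.307
KAPPA_EXACT = 2.0 / A0     # 2 / a_0, roughly 6.52; the text rounds it up to 6.6


def phi(t: float) -> float:
    """phi(t) = t - log(1 + t)."""
    return t - math.log1p(t)


def lower_envelope(t: float) -> float:
    """a_0 t^2 for t <= 1 and a_0 t for t > 1."""
    return A0 * (t * t if t <= 1.0 else t)


if __name__ == "__main__":
    print("a_0 =", A0, " 2/a_0 =", KAPPA_EXACT)
    for t in (0.1, 0.5, 1.0, 2.0, 10.0):
        print(t, phi(t), lower_envelope(t))
```

The two pieces of the envelope meet at $t = 1$, where $\phi(1) = a_0$ and the inequality holds with equality.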

B Proof of Theorem 2.2 

The matrix $B^2$ can be represented as $U^\top \operatorname{diag}(a_1, \ldots, a_p) U$ for an orthogonal matrix
$U$. The vector $\tilde\xi = U \xi$ is also standard normal and $\|B\xi\|^2 = \tilde\xi^\top U B^2 U^\top \tilde\xi$. This means
that one can reduce the situation to the case of a diagonal matrix $B^2 = \operatorname{diag}(a_1, \ldots, a_p)$.
We can also assume without loss of generality that $a_1 \ge a_2 \ge \ldots \ge a_p$. The expressions
for the quantities $p$ and $v^2$ simplify to

$$p = \operatorname{tr}(B^2) = a_1 + \ldots + a_p, \qquad v^2 = 2 \operatorname{tr}(B^4) = 2 (a_1^2 + \ldots + a_p^2).$$

Moreover, rescaling the matrix $B^2$ by $a_1$ reduces the situation to the case with $a_1 = 1$.
Lemma B.1. It holds

$$\mathbb{E} \|B\xi\|^2 = \operatorname{tr}(B^2), \qquad \operatorname{Var}\bigl(\|B\xi\|^2\bigr) = 2 \operatorname{tr}(B^4).$$

Moreover, for $\mu < 1$

$$\mathbb{E} \exp\{\mu \|B\xi\|^2/2\} = \det(I_p - \mu B^2)^{-1/2} = \prod_{i=1}^p (1 - \mu a_i)^{-1/2}. \tag{B.1}$$

Proof. If $B^2$ is diagonal, then $\|B\xi\|^2 = \sum_i a_i \xi_i^2$ and the summands $a_i \xi_i^2$ are
independent. It remains to note that $\mathbb{E}(a_i \xi_i^2) = a_i$, $\operatorname{Var}(a_i \xi_i^2) = 2 a_i^2$, and for $\mu a_i < 1$,

$$\mathbb{E} \exp\{\mu a_i \xi_i^2/2\} = (1 - \mu a_i)^{-1/2}$$

yielding (B.1). □

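Given $u$, fix $\mu < 1$. The exponential Markov inequality yields

$$\mathbb{P}(\|B\xi\|^2 > p + u) \le \exp\Bigl\{ -\frac{\mu (p + u)}{2} \Bigr\} \mathbb{E} \exp\Bigl( \frac{\mu \|B\xi\|^2}{2} \Bigr) = \exp\Bigl\{ -\frac{\mu u}{2} - \frac{1}{2} \sum_{i=1}^p \bigl[ \mu a_i + \log(1 - \mu a_i) \bigr] \Bigr\}.$$

We start with the case when $x^{1/2} \le v/3$. Then $u = 2 x^{1/2} v$ fulfills $u \le 2 v^2/3$. Define
$\mu = u/v^2 \le 2/3$ and use that $t + \log(1 - t) \ge -t^2$ for $t \le 2/3$. This implies

$$\mathbb{P}(\|B\xi\|^2 > p + u) \le \exp\Bigl\{ -\frac{\mu u}{2} + \frac{1}{2} \sum_{i=1}^p \mu^2 a_i^2 \Bigr\} = \exp\bigl( -u^2/(4 v^2) \bigr) = e^{-x}. \tag{B.2}$$

Next, let $x^{1/2} > v/3$. Set $\mu = 2/3$. It holds similarly to the above

$$\sum_{i=1}^p \bigl[ \mu a_i + \log(1 - \mu a_i) \bigr] \ge -\sum_{i=1}^p \mu^2 a_i^2 \ge -2 v^2/9 \ge -2x.$$

Now, for $u = 6x$ and $\mu u/2 = 2x$, (B.2) implies

$$\mathbb{P}(\|B\xi\|^2 > p + u) \le \exp\{-(2x - x)\} = \exp(-x)$$

as required.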
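
The inequality $t + \log(1 - t) \ge -t^2$, which drives both cases of the proof, can be checked on the range $0 \le t \le 2/3$ where it is applied (with $t = \mu a_i$). The sketch below is an illustration with a hypothetical function name; it also shows that the inequality no longer holds at $t = 0.7$, which is consistent with restricting $\mu$ to at most $2/3$.

```python
import math


def log_inequality_gap(t: float) -> float:
    """t + log(1 - t) + t^2; nonnegative when t + log(1 - t) >= -t^2 holds."""
    return t + math.log1p(-t) + t * t


if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 2.0 / 3.0, 0.7):
        print(t, log_inequality_gap(t))
```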



C Proof of Theorem 3.1 

The main step of the proof is the following exponential bound. 
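Lemma C.1. Suppose (1.1). For any $\mu < 1$ with $g^2 > p\mu$, it holds

$$\mathbb{E} \exp\Bigl( \frac{\mu \|\xi\|^2}{2} \Bigr) \mathbb{1}\bigl( \|\xi\| \le g/\mu - \sqrt{p/\mu} \bigr) \le 2 (1 - \mu)^{-p/2}. \tag{C.1}$$

Proof. Let $\varepsilon$ be a standard normal vector in $\mathbb{R}^p$ and $u \in \mathbb{R}^p$. The bound
$\mathbb{P}(\|\varepsilon\|^2 > p) \le 1/2$ implies for any vector $u$ and any $r$ with $r \ge \|u\| + p^{1/2}$ that
$\mathbb{P}(\|u + \varepsilon\| \le r) \ge 1/2$. Let us fix some $\xi$ with $\|\xi\| \le g/\mu - \sqrt{p/\mu}$ and denote by
$\mathbb{P}_\xi$ the conditional probability given $\xi$. It holds with $c_p = (2\pi)^{-p/2}$

$$\begin{aligned}
c_p \int \exp\Bigl( \gamma^\top \xi - \frac{\|\gamma\|^2}{2\mu} \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma
&= c_p \exp(\mu \|\xi\|^2/2) \int \exp\Bigl( -\frac{1}{2} \bigl\| \mu^{-1/2} \gamma - \mu^{1/2} \xi \bigr\|^2 \Bigr) \mathbb{1}(\mu^{-1/2} \|\gamma\| \le \mu^{-1/2} g) \, d\gamma \\
&= \mu^{p/2} \exp(\mu \|\xi\|^2/2) \, \mathbb{P}_\xi\bigl( \|\varepsilon + \mu^{1/2} \xi\| \le \mu^{-1/2} g \bigr) \\
&\ge 0.5 \mu^{p/2} \exp(\mu \|\xi\|^2/2),
\end{aligned}$$

because $\|\mu^{1/2} \xi\| + p^{1/2} \le \mu^{-1/2} g$. This implies in view of $p < g^2/\mu$ that

$$\exp(\mu \|\xi\|^2/2) \, \mathbb{1}\bigl( \|\xi\| \le g/\mu - \sqrt{p/\mu} \bigr) \le 2 \mu^{-p/2} c_p \int \exp\Bigl( \gamma^\top \xi - \frac{\|\gamma\|^2}{2\mu} \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma.$$

Further, by (1.1)

$$\begin{aligned}
c_p \mathbb{E} \int \exp\Bigl( \gamma^\top \xi - \frac{1}{2\mu} \|\gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma
&\le c_p \int \exp\Bigl( -\frac{\mu^{-1} - 1}{2} \|\gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma \\
&\le c_p \int \exp\Bigl( -\frac{\mu^{-1} - 1}{2} \|\gamma\|^2 \Bigr) d\gamma \\
&\le (\mu^{-1} - 1)^{-p/2}
\end{aligned}$$

and (C.1) follows. □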



Due to this result, the scaled squared norm $\mu \|\xi\|^2/2$ after a proper truncation
possesses the same exponential moments as in the Gaussian case. A straightforward
implication is the probability bound $\mathbb{P}(\|\xi\|^2 > p + u)$ for moderate values $u$. Namely, given
$u > 0$, define $\mu = u/(u + p)$. This value optimizes the inequality (A.1) in the Gaussian
case. Now we can apply a similar bound under the constraint $\|\xi\| \le g/\mu - \sqrt{p/\mu}$.
Therefore, the bound is only meaningful if $\sqrt{u + p} \le g/\mu - \sqrt{p/\mu}$ with $\mu = u/(u + p)$,
or, with $w = \sqrt{u/p}$, if $w \le w_c$; see (3.1).

The largest value $u$ for which this constraint is still valid is given by $p + u = y_c^2$.
Hence, (C.1) yields for $p + u \le y_c^2$

$$\begin{aligned}
\mathbb{P}\bigl(\|\xi\|^2 > p + u, \ \|\xi\| \le y_c\bigr)
&\le \exp\Bigl\{ -\frac{\mu (p + u)}{2} \Bigr\} \mathbb{E} \exp\Bigl( \frac{\mu \|\xi\|^2}{2} \Bigr) \mathbb{1}\bigl( \|\xi\| \le g/\mu - \sqrt{p/\mu} \bigr) \\
&\le 2 \exp\bigl\{ -0.5 \bigl[ \mu (p + u) + p \log(1 - \mu) \bigr] \bigr\} \\
&= 2 \exp\bigl\{ -0.5 \bigl[ u - p \log(1 + u/p) \bigr] \bigr\}.
\end{aligned}$$

Similarly to the Gaussian case, this implies with $\varkappa = 6.6$ that

$$\mathbb{P}\bigl(\|\xi\|^2 \ge p + \sqrt{\varkappa x p} \vee (\varkappa x), \ \|\xi\| \le y_c\bigr) \le 2 \exp(-x).$$

The Gaussian case means that (1.1) holds with $g = \infty$ yielding $y_c = \infty$. In the
non-Gaussian case with a finite $g$, we have to accompany the moderate deviation bound with
a large deviation bound $\mathbb{P}(\|\xi\| > y)$ for $y \ge y_c$. This is done by combining the bound
(C.1) with the standard slicing arguments.



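Lemma C.2. Let $\mu_0 \le g^2/p$. Define $y_0 = g/\mu_0 - \sqrt{p/\mu_0}$ and $g_0 = \mu_0 y_0 = g - \sqrt{\mu_0 p}$.
It holds for $y \ge y_0$

$$\mathbb{P}(\|\xi\| > y) \le 8.4 (1 - g_0/y)^{-p/2} \exp(-g_0 y/2) \tag{C.2}$$

$$\le 8.4 \exp\{-x_0 - g_0 (y - y_0)/2\}, \tag{C.3}$$

with $x_0$ defined by

$$2 x_0 = \mu_0 y_0^2 + p \log(1 - \mu_0) = \bigl( g/\sqrt{\mu_0} - \sqrt{p} \bigr)^2 + p \log(1 - \mu_0).$$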

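Proof. Consider the growing sequence $y_k$ with $y_1 = y$ and $g_0 y_{k+1} = g_0 y + k$. Define
also $\mu_k = g_0/y_k$. In particular, $\mu_k \le \mu_1 = g_0/y$. Obviously

$$\mathbb{P}(\|\xi\| > y) = \sum_{k=1}^\infty \mathbb{P}\bigl(\|\xi\| > y_k, \ \|\xi\| \le y_{k+1}\bigr).$$

Now we try to evaluate every slicing probability in this expression. We use that

$$\mu_{k+1} y_k^2 = \frac{(g_0 y + k - 1)^2}{g_0 y + k} \ge g_0 y + k - 2,$$

and also $g/\mu_k - \sqrt{p/\mu_k} \ge y_k$ because $g - g_0 = \sqrt{\mu_0 p} \ge \sqrt{\mu_k p}$ and

$$g/\mu_k - \sqrt{p/\mu_k} - y_k = \mu_k^{-1} \bigl( g - \sqrt{\mu_k p} - g_0 \bigr) \ge 0.$$

Hence by (C.1)

$$\begin{aligned}
\mathbb{P}(\|\xi\| > y)
&= \sum_{k=1}^\infty \mathbb{P}\bigl(\|\xi\| > y_k, \ \|\xi\| \le y_{k+1}\bigr) \\
&\le \sum_{k=1}^\infty \exp\Bigl( -\frac{\mu_{k+1} y_k^2}{2} \Bigr) \mathbb{E} \exp\Bigl( \frac{\mu_{k+1} \|\xi\|^2}{2} \Bigr) \mathbb{1}\bigl( \|\xi\| \le y_{k+1} \bigr) \\
&\le \sum_{k=1}^\infty 2 (1 - \mu_{k+1})^{-p/2} \exp\Bigl( -\frac{\mu_{k+1} y_k^2}{2} \Bigr) \\
&\le 2 (1 - \mu_1)^{-p/2} \sum_{k=1}^\infty \exp\Bigl( -\frac{g_0 y + k - 2}{2} \Bigr) \\
&= 2 e^{1/2} (1 - e^{-1/2})^{-1} (1 - \mu_1)^{-p/2} \exp(-g_0 y/2) \\
&\le 8.4 (1 - \mu_1)^{-p/2} \exp(-g_0 y/2)
\end{aligned}$$

and the first assertion follows. For $y = y_0$, it holds

$$g_0 y_0 + p \log(1 - \mu_0) = \mu_0 y_0^2 + p \log(1 - \mu_0) = 2 x_0$$

and (C.2) implies $\mathbb{P}(\|\xi\| > y_0) \le 8.4 \exp(-x_0)$. Now observe that the function $f(y) =
g_0 y/2 + (p/2) \log(1 - g_0/y)$ fulfills $f(y_0) = x_0$ and $f'(y) \ge g_0/2$ yielding $f(y) \ge
x_0 + g_0 (y - y_0)/2$. This implies (C.3). □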
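
The constant 8.4 in (C.2) comes from the geometric series produced by the slicing argument. The sketch below is illustrative only (the function names are hypothetical): it evaluates $2 \sum_{k \ge 1} e^{-(k-2)/2} = 2 e^{1/2} (1 - e^{-1/2})^{-1}$ in closed form and through partial sums, and checks the elementary bound $(a - 1)^2/a \ge a - 2$ used for $\mu_{k+1} y_k^2$.

```python
import math


def slicing_constant() -> float:
    """Closed form 2 e^{1/2} / (1 - e^{-1/2}) of 2 * sum_{k>=1} exp(-(k - 2)/2)."""
    return 2.0 * math.exp(0.5) / (1.0 - math.exp(-0.5))


def slicing_partial_sum(n: int) -> float:
    """Partial sum 2 * sum_{k=1}^{n} exp(-(k - 2)/2)."""
    return 2.0 * sum(math.exp(-(k - 2) / 2.0) for k in range(1, n + 1))


def slice_exponent_gap(a: float) -> float:
    """(a - 1)^2 / a - (a - 2); equals 1/a, hence positive for a > 0."""
    return (a - 1.0) ** 2 / a - (a - 2.0)


if __name__ == "__main__":
    print(slicing_constant())        # about 8.38, which is below 8.4
    print(slicing_partial_sum(200))
```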

The statements of the theorem are obtained by applying the lemmas with $\mu_0 = \mu_c =
w_c^2/(1 + w_c^2)$. This also implies $y_0 = y_c$, $x_0 = x_c$, and $g_0 = g_c = g - \sqrt{\mu_c p}$; cf. (3.2).

D Proof of Theorem 4.1 

The main steps of the proof are similar to the proof of Theorem 3.1. 
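Lemma D.1. Suppose (1.1). For any $\mu < 1$ with $g^2/\mu \ge p$, it holds

$$\mathbb{E} \exp\bigl( \mu \|B\xi\|^2/2 \bigr) \mathbb{1}\bigl( \|B^2 \xi\| \le g/\mu - \sqrt{p/\mu} \bigr) \le 2 \det(I_p - \mu B^2)^{-1/2}. \tag{D.1}$$

Proof. With $c_p(B) = (2\pi)^{-p/2} \det(B^{-1})$

$$\begin{aligned}
c_p(B) \int \exp\Bigl( \gamma^\top \xi - \frac{1}{2\mu} \|B^{-1} \gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma
&= c_p(B) \exp\Bigl( \frac{\mu \|B\xi\|^2}{2} \Bigr) \int \exp\Bigl( -\frac{1}{2} \bigl\| \mu^{1/2} B \xi - \mu^{-1/2} B^{-1} \gamma \bigr\|^2 \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma \\
&= \mu^{p/2} \exp\Bigl( \frac{\mu \|B\xi\|^2}{2} \Bigr) \mathbb{P}_\xi\bigl( \|\mu^{-1/2} B \varepsilon + B^2 \xi\| \le g/\mu \bigr),
\end{aligned}$$

where $\varepsilon$ denotes a standard normal vector in $\mathbb{R}^p$ and $\mathbb{P}_\xi$ means the conditional
probability given $\xi$. Moreover, for any $u \in \mathbb{R}^p$ and $r \ge p^{1/2} + \|u\|$, it holds in view of
$\mathbb{P}(\|B\varepsilon\|^2 > p) \le 1/2$

$$\mathbb{P}(\|B\varepsilon - u\| \le r) \ge \mathbb{P}(\|B\varepsilon\| \le \sqrt{p}) \ge 1/2.$$

This implies

$$\exp\bigl( \mu \|B\xi\|^2/2 \bigr) \mathbb{1}\bigl( \|B^2 \xi\| \le g/\mu - \sqrt{p/\mu} \bigr) \le 2 \mu^{-p/2} c_p(B) \int \exp\Bigl( \gamma^\top \xi - \frac{1}{2\mu} \|B^{-1} \gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma.$$

Further, by (1.1)

$$\begin{aligned}
c_p(B) \mathbb{E} \int \exp\Bigl( \gamma^\top \xi - \frac{1}{2\mu} \|B^{-1} \gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\| \le g) \, d\gamma
&\le c_p(B) \int \exp\Bigl( \frac{\|\gamma\|^2}{2} - \frac{1}{2\mu} \|B^{-1} \gamma\|^2 \Bigr) d\gamma \\
&\le \det(B^{-1}) \det(\mu^{-1} B^{-2} - I_p)^{-1/2} = \mu^{p/2} \det(I_p - \mu B^2)^{-1/2}
\end{aligned}$$

and (D.1) follows. □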
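Now we evaluate the probability $\mathbb{P}(\|B\xi\| > y)$ for moderate values of $y$.

Lemma D.2. Let $\mu_0 < 1 \wedge (g^2/p)$. With $y_0 = g/\mu_0 - \sqrt{p/\mu_0}$, it holds for any $u > 0$

$$\mathbb{P}\bigl(\|B\xi\|^2 > p + u, \ \|B^2 \xi\| \le y_0\bigr) \le 2 \exp\bigl\{ -0.5 \mu_0 (p + u) - 0.5 \log\det(I_p - \mu_0 B^2) \bigr\}. \tag{D.2}$$

In particular, if $B^2$ is diagonal, that is, $B^2 = \operatorname{diag}(a_1, \ldots, a_p)$, then

$$\mathbb{P}\bigl(\|B\xi\|^2 > p + u, \ \|B^2 \xi\| \le y_0\bigr) \le 2 \exp\Bigl\{ -\frac{\mu_0 u}{2} - \frac{1}{2} \sum_{i=1}^p \bigl[ \mu_0 a_i + \log(1 - \mu_0 a_i) \bigr] \Bigr\}. \tag{D.3}$$

Proof. The exponential Chebyshev inequality and (D.1) imply

$$\begin{aligned}
\mathbb{P}\bigl(\|B\xi\|^2 > p + u, \ \|B^2 \xi\| \le y_0\bigr)
&\le \exp\Bigl\{ -\frac{\mu_0 (p + u)}{2} \Bigr\} \mathbb{E} \exp\Bigl( \frac{\mu_0 \|B\xi\|^2}{2} \Bigr) \mathbb{1}\bigl( \|B^2 \xi\| \le g/\mu_0 - \sqrt{p/\mu_0} \bigr) \\
&\le 2 \exp\bigl\{ -0.5 \mu_0 (p + u) - 0.5 \log\det(I_p - \mu_0 B^2) \bigr\}.
\end{aligned}$$

Moreover, the standard change-of-basis arguments allow us to reduce the problem to the
case of a diagonal matrix $B^2 = \operatorname{diag}(a_1, \ldots, a_p)$ where $1 = a_1 \ge a_2 \ge \ldots \ge a_p > 0$.
Note that $p = a_1 + \ldots + a_p$. Then the claim (D.2) can be written in the form (D.3). □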

Now we evaluate a large deviation probability that $\|B\xi\| > y$ for a large $y$. Note
that the condition $\|B^2\|_\infty \le 1$ implies $\|B^2 \xi\| \le \|B\xi\|$. So, the bound (D.2) continues
to hold when $\|B^2 \xi\| \le y_0$ is replaced by $\|B\xi\| \le y_0$.

Lemma D.3. Let $\mu_0 < 1$ and $\mu_0 p < g^2$. Define $g_0$ by $g_0 = g - \sqrt{\mu_0 p}$. For any
$y \ge y_0 \stackrel{\mathrm{def}}{=} g_0/\mu_0$, it holds

$$\mathbb{P}(\|B\xi\| > y) \le 8.4 \det\{I_p - (g_0/y) B^2\}^{-1/2} \exp(-g_0 y/2) \le 8.4 \exp(-x_0 - g_0 (y - y_0)/2), \tag{D.4}$$

where $x_0$ is defined by

$$2 x_0 = g_0 y_0 + \log\det\{I_p - (g_0/y_0) B^2\}.$$

Proof. The slicing arguments of Lemma C.2 apply here in the same manner. One has
to replace $\|\xi\|$ by $\|B\xi\|$ and $(1 - \mu_1)^{-p/2}$ by $\det\{I_p - (g_0/y) B^2\}^{-1/2}$. We omit the
details. In particular, with $y = y_0 = g_0/\mu_0$, this yields

$$\mathbb{P}(\|B\xi\| > y_0) \le 8.4 \exp(-x_0).$$

Moreover, for the function $f(y) = g_0 y + \log\det\{I_p - (g_0/y) B^2\}$, it holds $f'(y) \ge g_0$
and hence, $f(y) \ge f(y_0) + g_0 (y - y_0)$ for $y > y_0$. This implies (D.4). □

One important feature of the results of Lemma D.2 and Lemma D.3 is that the value
$\mu_0 < 1 \wedge (g^2/p)$ can be selected arbitrarily. In particular, for $y \ge y_c$, Lemma D.3
with $\mu_0 = \mu_c$ yields the large deviation probability $\mathbb{P}(\|B\xi\| > y)$. For bounding the
probability $\mathbb{P}(\|B\xi\|^2 > p + u, \ \|B\xi\| \le y_c)$, we use the inequality $\log(1 - t) \ge -t - t^2$
for $t \le 2/3$. It implies for $\mu \le 2/3$ that

$$\begin{aligned}
-2 \log\bigl\{ \tfrac{1}{2} \mathbb{P}\bigl(\|B\xi\|^2 > p + u, \ \|B\xi\| \le y_c\bigr) \bigr\}
&\ge \mu (p + u) + \sum_{i=1}^p \log(1 - \mu a_i) \\
&\ge \mu (p + u) - \sum_{i=1}^p \bigl( \mu a_i + \mu^2 a_i^2 \bigr) \ge \mu u - \mu^2 v^2/2.
\end{aligned} \tag{D.5}$$

Now we distinguish between $\mu_c = 2/3$ and $\mu_c < 2/3$, starting with $\mu_c = 2/3$. The
bound (D.5) with $\mu = 2/3$ and with $u = (2 v x^{1/2}) \vee (6x)$ yields

$$\mathbb{P}\bigl(\|B\xi\|^2 > p + u, \ \|B\xi\| \le y_c\bigr) \le 2 \exp(-x);$$

see the proof of Theorem 2.2 for the Gaussian case.

Now consider $\mu_c < 2/3$. For $x^{1/2} \le \mu_c v/2$, use $u = 2 v x^{1/2}$ and $\mu_0 = u/v^2$. It
holds $\mu_0 = u/v^2 \le \mu_c$ and $u^2/(4 v^2) = x$ yielding the desired bound by (D.5). For
$x^{1/2} > \mu_c v/2$, we select again $\mu_0 = \mu_c$. It holds with $u = 4 \mu_c^{-1} x$ that $\mu_c u/2 - \mu_c^2 v^2/4 \ge
2x - x = x$. This completes the proof.

E Proof of Theorem 6.1 

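The arguments behind the result are the same as in the one-norm case of Theorem 3.1.
We only outline the main steps.

Lemma E.1. Suppose (6.1) and (6.2). For any $\mu < 1$ with $g_\circ > \mu^{1/2} r_*$, it holds

$$\mathbb{E} \exp\bigl( \mu \|\xi\|^2/2 \bigr) \mathbb{1}\bigl( \|\xi\|_\circ \le g_\circ/\mu - r_*/\mu^{1/2} \bigr) \le 2 (1 - \mu)^{-p/2}. \tag{E.1}$$

Proof. Let $\varepsilon$ be a standard normal vector in $\mathbb{R}^p$ and $u \in \mathbb{R}^p$. Let us fix some $\xi$ with
$\mu^{1/2} \|\xi\|_\circ \le \mu^{-1/2} g_\circ - r_*$ and denote by $\mathbb{P}_\xi$ the conditional probability given $\xi$. It holds
by (6.2) with $c_p = (2\pi)^{-p/2}$

$$\begin{aligned}
c_p \int \exp\Bigl( \gamma^\top \xi - \frac{1}{2\mu} \|\gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\|_\circ \le g_\circ) \, d\gamma
&= c_p \exp(\mu \|\xi\|^2/2) \int \exp\Bigl( -\frac{1}{2} \bigl\| \mu^{1/2} \xi - \mu^{-1/2} \gamma \bigr\|^2 \Bigr) \mathbb{1}(\mu^{-1/2} \|\gamma\|_\circ \le \mu^{-1/2} g_\circ) \, d\gamma \\
&= \mu^{p/2} \exp(\mu \|\xi\|^2/2) \, \mathbb{P}_\xi\bigl( \|\varepsilon - \mu^{1/2} \xi\|_\circ \le \mu^{-1/2} g_\circ \bigr) \\
&\ge 0.5 \mu^{p/2} \exp(\mu \|\xi\|^2/2).
\end{aligned}$$

This implies

$$\exp\Bigl( \frac{\mu \|\xi\|^2}{2} \Bigr) \mathbb{1}\bigl( \|\xi\|_\circ \le g_\circ/\mu - r_*/\mu^{1/2} \bigr) \le 2 \mu^{-p/2} c_p \int \exp\Bigl( \gamma^\top \xi - \frac{1}{2\mu} \|\gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\|_\circ \le g_\circ) \, d\gamma.$$

Further, by (6.1)

$$c_p \mathbb{E} \int \exp\Bigl( \gamma^\top \xi - \frac{1}{2\mu} \|\gamma\|^2 \Bigr) \mathbb{1}(\|\gamma\|_\circ \le g_\circ) \, d\gamma \le c_p \int \exp\Bigl( -\frac{\mu^{-1} - 1}{2} \|\gamma\|^2 \Bigr) d\gamma \le (\mu^{-1} - 1)^{-p/2}$$

and (E.1) follows. □

As in the Gaussian case, (E.1) implies for $\mathfrak{z} > p$ with $\mu = \mu(\mathfrak{z}) = (\mathfrak{z} - p)/\mathfrak{z}$ the bounds
(6.4) and (6.5). Note that the value $\mu(\mathfrak{z})$ clearly grows with $\mathfrak{z}$ from zero to one, while
$g_\circ/\mu(\mathfrak{z}) - r_*/\mu^{1/2}(\mathfrak{z})$ is strictly decreasing. The value $\mathfrak{z}_\circ$ is defined exactly as the point
where $g_\circ/\mu(\mathfrak{z}) - r_*/\mu^{1/2}(\mathfrak{z})$ crosses $u_\circ$, so that $g_\circ/\mu(\mathfrak{z}) - r_*/\mu^{1/2}(\mathfrak{z}) \ge u_\circ$ for $\mathfrak{z} \le \mathfrak{z}_\circ$.

For $\mathfrak{z} > \mathfrak{z}_\circ$, the choice $\mu = \mu(\mathfrak{z})$ conflicts with $g_\circ/\mu(\mathfrak{z}) - r_*/\mu^{1/2}(\mathfrak{z}) \ge u_\circ$. So, we
apply $\mu = \mu_\circ$ yielding by the Markov inequality

$$\mathbb{P}\bigl(\|\xi\|^2 > \mathfrak{z}, \ \|\xi\|_\circ \le u_\circ\bigr) \le 2 \exp\{-\mu_\circ \mathfrak{z}/2 - (p/2) \log(1 - \mu_\circ)\},$$

and the assertion follows.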

F Proof of Theorem 6.2 

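Arguments from the proof of Lemmas D.1 and E.1 yield in view of $g_\circ \mu_\circ^{-1} - r_* \mu_\circ^{-1/2} \ge u_\circ$

$$\begin{aligned}
\mathbb{E} \exp\bigl\{ \mu_\circ \|\Pi\xi\|^2/2 \bigr\} \mathbb{1}\bigl( \|\Pi^2 \xi\|_\circ \le u_\circ \bigr)
&\le \mathbb{E} \exp\bigl( \mu_\circ \|\Pi\xi\|^2/2 \bigr) \mathbb{1}\bigl( \|\Pi^2 \xi\|_\circ \le g_\circ/\mu_\circ - r_*/\mu_\circ^{1/2} \bigr) \\
&\le 2 \det(I_p - \mu_\circ \Pi^2)^{-1/2}.
\end{aligned}$$

Now the inequality $\log(1 - t) \ge -t - t^2$ for $t \le 2/3$ implies

$$-\log\det(I_p - \mu_\circ \Pi^2) \le \mu_\circ p + \mu_\circ^2 v^2/2,$$

cf. (D.5); the assertion (6.6) follows.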